# Charge Sharing Tolerant Domino With Contention Current Partitioning For Wide Fan-In OR Logic Gates

Sourabh Yadav<sup>1</sup>, Rahul Gupta<sup>2</sup>, Manjeet Singh Sonwani<sup>3</sup>, Chetna Sinha<sup>4</sup>, Sanjay Kumar Dewangan<sup>5</sup>

<sup>1, 2</sup>Department of Electronics & Telecommunication, GEC Bilaspur, India.

<sup>3, 4</sup>Department of Electronics & Telecommunication, GEC Raipur, India.

<sup>5</sup>Department of Electrical Engineering, GEC Bilaspur, India.

#### Abstract

This paper presents a charge-sharing tolerant domino circuit technique. The technique uses a dual strategy to cope with charge sharing in domino circuits, which arises from transistor loading at wide fan-in and from current contention at the keeper stage. First, the proposed Charge Sharing Tolerant Domino (CSTD) isolates the drain circuit from the short-circuit current, thereby avoiding leakage-related losses and charge sharing in the domino circuit. Second, CSTD uses a modified keeper circuit to reduce the contention current at the keeper stage. The proposed circuit has been studied and its performance optimized. CSTD is compared with the standard footed and footless domino and with other recent domino circuit techniques. CSTD reduces power consumption by 51.2% compared to SFD and by 18.9% compared to the recent LPSD circuit technique, at 64-bit fan-in. The noise metric, determined by Average Noise Immunity, improves by 189% compared to SFD and by 37% compared to LPSD. Simulations use 90nm NMOS and PMOS models at 500MHz.

## 1. Introduction

The dynamic logic circuit is a next-generation logic style that uses the circuit's inherent parasitic capacitance to hold the output voltage, without the need to always keep a low impedance path from the output to the supply or ground [1-6]. This results in low power consumption in dynamic logic. Low power consumption reduces the need for large heat sinks, reduces system cost, extends battery life and fuels further technological advancement [7-12]. Dynamic circuits thus become a natural choice for such low power applications.

Dynamic logic is faster due to low input capacitance and no contention during logic transitions. Moreover, it allows circuit designers to construct efficient circuits in less chip area and offers much better transistor sizing optimizations. With all these benefits, dynamic logic is the first choice for any application that mainly requires high speed and low power, such as high speed data paths, arithmetic functions, Manchester carry chains, decoders in on-chip SRAM arrays, flash memories, PLAs and multiplexers. However, dynamic circuits suffer from some limitations too.

The total power consumption in a logic circuit is a function of switching activity, capacitance, voltage and the structure and design of MOS transistor. In order to target specific design areas for specific reduction in power losses, total power is divided into 3 components [13] as given below.

$$P_{\text{total}} = P_{\text{switching}} + P_{\text{short-circuit}} + P_{\text{leakage}} \tag{1}$$

Here $P_{switching}$ is the power consumed during switching between voltage levels. It is dissipated by the charging and discharging of internal and net capacitances. $P_{short-circuit}$ denotes the power consumed due to the direct conduction path between supply and ground during switching. And $P_{leakage}$ is the sum of all leakage current losses in the circuit during any steady voltage state.

$$P_{\text{switching}} = \alpha C_{\text{eff}} V_{\text{dd}} V_{\text{swing}} f \tag{2}$$

In the above equation,  $\alpha$  represents the switching activity of logic operation. C<sub>eff</sub> is the effective capacitance during that switching activity. f is the switching frequency. V<sub>dd</sub> is the supply voltage. And V<sub>swing</sub> is the output voltage swing of the logic gate.
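As a rough numerical illustration of equation (2), the sketch below evaluates the switching power at the 1 V supply and 500MHz clock used in this paper. The activity factor, effective capacitance and reduced swing value are hypothetical numbers chosen only to show how the terms combine.

```python
def switching_power(alpha, c_eff, v_dd, v_swing, freq):
    """Switching power per equation (2): alpha * C_eff * V_dd * V_swing * f."""
    return alpha * c_eff * v_dd * v_swing * freq


# Assumed operating point: activity 0.5 and C_eff = 10 fF are illustrative only.
full_swing = switching_power(0.5, 10e-15, 1.0, 1.0, 500e6)
# Same gate with a hypothetical output swing reduced to 0.8 V.
reduced_swing = switching_power(0.5, 10e-15, 1.0, 0.8, 500e6)

print(f"full swing:    {full_swing * 1e6:.2f} uW")
print(f"reduced swing: {reduced_swing * 1e6:.2f} uW")
```

Because the swing enters linearly, any reduction in output voltage swing translates directly into the same fractional reduction in switching power.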

$$P_{\text{short-circuit}} = I_{\text{sc}} V_{\text{dd}} f \tag{3}$$

This term represents the short circuit power consumption. Dynamic gates provide a better advantage for keeping this component low by allowing this connection for least time possible. It can be further improved by optimized clock operation and voltage supply scaling.

The last component  $P_{leakage}$  is a function of voltage supply, switching threshold voltage and transistor size. This is why it depends largely on the technology being used. This is called leakage power consumption. Ideally it should be zero, but in any realistic circuit, there is always a small leakage current flowing mainly due to diode reverse bias current, sub threshold current, gate induced drain leakage and gate oxide leakage. Of all contributors, subthreshold current is the major contributor to this leakage power consumption. This is also called static power while the other 2 components together are termed as dynamic power.

With technology downscaling up to 45nm, dynamic power is the major component because of high capacitance induced power losses, while leakage is considerably less [14]. At deeper nodes beyond 45nm, dynamic power consumption is lowered due to smaller components, but static power consumption becomes the major contributor to total power consumption. This is because subthreshold leakage current rises exponentially as the threshold voltage is reduced, as shown in (4). Circuit techniques and strategies therefore need to be chosen according to the technology in use. This paper focuses on dynamic power reduction via a novel circuit technique using 90nm technology models for transistors.

$$I_{\text{sub-th}} = I_0 \left(1 - e^{-\frac{V_{DS}}{V_t}}\right) e^{\frac{V_{GS} - V_{TH} + \eta V_{DS}}{n V_t}} \tag{4}$$

The dependence on threshold voltage is shown above. It is highly affected by transistor size which is why sizing optimizations are important to reduce such leakage current.

$$I_0 = \mu_0 C_{ox} \frac{W}{L} (n-1) V_t^2 \tag{5}$$

$V_t$ is thermal voltage, $V_{DS}$ is drain to source voltage, $V_{GS}$ is gate to source voltage and $V_{TH}$ is threshold voltage as given in (6). $\eta$ is the coefficient for drain induced barrier lowering. W/L is the aspect ratio of the transistor. n is the subthreshold swing coefficient. $\mu_0$ denotes mobility at no bias voltage and $C_{ox}$ is gate oxide capacitance.
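The exponential sensitivity of subthreshold current to threshold voltage can be illustrated numerically. The sketch below assumes the common textbook form of the subthreshold equation (negative exponent in the drain term, subthreshold swing coefficient n in the slope term). The values of `i0`, `n`, `eta` and the two threshold voltages are illustrative assumptions, not values from the simulations in this paper.

```python
import math


def subthreshold_current(v_gs, v_ds, v_th, i0=1e-7, n=1.5, eta=0.1, v_t=0.02585):
    """Subthreshold drain current in amperes for the assumed model parameters."""
    drain_term = 1.0 - math.exp(-v_ds / v_t)
    gate_term = math.exp((v_gs - v_th + eta * v_ds) / (n * v_t))
    return i0 * drain_term * gate_term


# OFF transistor (V_GS = 0) at V_DS = 1 V for two hypothetical thresholds.
high_vth = subthreshold_current(0.0, 1.0, 0.35)
low_vth = subthreshold_current(0.0, 1.0, 0.25)

print(f"leakage ratio for a 100 mV lower threshold: {low_vth / high_vth:.1f}x")
```

With these assumed parameters, the ratio depends only on the threshold difference divided by $n V_t$, which is why threshold scaling dominates leakage at deeper nodes.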

$$V_{\rm TH} = V_{\rm T0} + \gamma (\sqrt{\varphi_{\rm S} + V_{\rm sb}} - \sqrt{\varphi_{\rm S}}) \tag{6}$$

Here $V_{T0}$ is the threshold voltage at zero body bias. $\gamma$ is the body bias factor. $\varphi_S$ is the potential needed to form the inversion layer. And $V_{sb}$ is the transistor bias between source and body. This induces the body bias effect, where $V_{TH}$ is increased because it adds reverse bias across the p type body and n type channel boundary for NMOS, thereby increasing the bulk depletion charge.
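A short numerical sketch of equation (6) follows. It takes $V_{T0}$ as the zero-bias threshold and $\gamma$ as the body bias factor, which is the usual reading of these symbols; the default parameter values are hypothetical and only illustrate the trend.

```python
import math


def threshold_voltage(v_sb, v_t0=0.3, gamma=0.4, phi_s=0.8):
    """Threshold voltage per equation (6) for a source-to-body bias v_sb (volts).

    v_t0, gamma and phi_s are assumed example values, not extracted from a model.
    """
    return v_t0 + gamma * (math.sqrt(phi_s + v_sb) - math.sqrt(phi_s))


for v_sb in (0.0, 0.25, 0.5):
    print(f"V_sb = {v_sb:.2f} V -> V_TH = {threshold_voltage(v_sb):.3f} V")
```

At zero source-to-body bias the expression reduces to $V_{T0}$, and the threshold rises monotonically as the reverse bias grows.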

In this paper, a novel circuit technique is proposed for high performance domino logic gates, using wide fan-in OR gates to implement the technique. Wide fan-in gates are employed in most high speed applications and carry the issue of high input capacitance, which results in high power consumption and degraded performance. We use a dual keeper based multi stage difference amplifier for wide fan-in gates to reduce power consumption and improve the performance of the circuit, at the expense of a manageable increase in chip area. Since dynamic circuits already operate in much smaller chip areas, such a design can be optimized and used for more efficient logic gate operations. The rest of the paper is organized as follows. Section 2 surveys the relevant circuit techniques, section 3 presents the proposed circuit and section 4 provides the results and discussion. Finally, section 5 concludes the evaluation of results and discusses future use of the proposed technique. The simulations are done in 90nm CMOS technology at 110. Supply voltage used is 1 volt and comparisons have been done for 8 bit, 16 bit, 32 bit and 64 bit wide fan-in OR gates.

## 2. Literature Survey

Conventional domino circuits are discussed in this section. Dynamic circuits gain their speed advantage by removing the redundant circuitry from the pull up and pull down networks, without affecting the current in the circuit. This usually means there is a precharge and an evaluate transistor in the circuit. This forms the early design for the standard footed domino [15], as shown in Fig. 1. However, conventional domino circuits differ from practical domino circuits in some aspects. One of the primary changes often made in domino implementations is the use of footless domino [16], presented in Fig. 2. With the added advantage of higher speed, it also allows more logic per gate. It is much more applicable in compound domino gates [17].



Fig. 1 Domino logic based OR gate circuit in standard footed configuration or SFD

Domino circuits work in 2 states, precharge and evaluate. The evaluation state is where the logic operation takes place and the output responds to the input combination, which makes this state more vulnerable to leakage current. Leakage persists even if the input combination is low. This has been a major area of concern for domino circuits [18]. Leakage current has 5 major sources, while other sources have only a marginal effect: subthreshold leakage, gate oxide tunneling leakage, reverse bias junction leakage, gate induced drain leakage and the gate current resulting from hot carrier injection. Due to this inherent leakage current and the dynamic design of the domino circuit, 2 major problems persist: charge leakage and charge sharing. They degrade the output voltage before the stipulated hold time of the logic operation. This not only reduces accuracy but also affects noise immunity. This does not pose a major problem at normal operating frequencies, but in circuits where the clock may stop and go into standby mode for some time to save power, it becomes crucial to keep the dynamic node voltage high. The problem is more severe for circuits with wide fan-in, due to their many parallel leakage paths, especially if they are used in a cascade configuration.
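The voltage droop caused by charge sharing can be estimated with a simple charge conservation argument: when a precharged dynamic node is connected to an internal node, the total charge redistributes over both capacitances. The sketch below is a first-order illustration that ignores leakage and any keeper current, and the capacitance values are hypothetical.

```python
def dynamic_node_after_sharing(v_dd, c_dyn, c_internal, v_internal=0.0):
    """Dynamic node voltage after it shares charge with an internal node.

    Charge conservation: (C_dyn * V_dd + C_int * V_int) / (C_dyn + C_int).
    """
    total_charge = c_dyn * v_dd + c_internal * v_internal
    return total_charge / (c_dyn + c_internal)


# Assumed values: 20 fF dynamic node precharged to 1 V, discharged internal nodes.
for c_int in (2e-15, 5e-15, 10e-15):
    v = dynamic_node_after_sharing(1.0, 20e-15, c_int)
    print(f"C_int = {c_int * 1e15:.0f} fF -> dynamic node = {v:.3f} V")
```

The larger the internal capacitance relative to the dynamic node capacitance, the deeper the droop, which is what a keeper or an isolation stage has to counteract.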

In order to tackle these issues, a keeper transistor is used at the dynamic node to hold the charge during the evaluate phase. This transistor can be driven by the output if the output stage is an inverter; otherwise a separate inverter is required. A weak PMOS keeper is used, whose purpose is to supply current and keep the voltage on the dynamic node high so that the gate does not flip erroneously. Therefore, the charge sharing problem does not affect the circuit operation, and the robustness of the circuit improves too. However, adding a keeper transistor gives rise to a contention current that opposes the evaluation current of the NMOS transistors in the pull down network (PDN). It typically increases the delay of the circuit by 4-6%. To manage this delay, keeper sizing optimizations become important. But the contention current also increases dynamic power losses. Because dynamic nodes are not driven or only weakly driven, domino is sensitive to noise coupled to inputs and internal nodes, and thus noise immunity is affected by the keeper too. Since domino circuits are already very fast, this degradation does not impact their performance relative to static logic circuits. However, compared to the conventional domino design, a trade-off arises between delay and the circuit's ability to withstand voltage fluctuations due to noise sources.



Fig. 2 Domino logic based OR gate circuit in standard footer less configuration or SFLD

The keeper sizing optimizations are better understood by analyzing the keeper ratio of the circuit. This effectively relates the current output for keeper transistor to PDN transistors.

$$K = \frac{\mu_{P}(\frac{W}{L})_{Keeper-transistor}}{\mu_{n}(\frac{W}{L})_{Evaluation-network}} \tag{7}$$

Here W denotes the width of the transistor and L denotes its length. $\mu_P$ and $\mu_n$ are the hole and electron mobilities, respectively. Conventionally the keeper is kept as small as possible, at least a factor of 4-10 smaller than the pull down transistors in the PDN. As the keeper ratio is increased by using a bigger PMOS transistor, noise immunity improves, but power consumption increases and the speed of the circuit degrades. Wide fan-in circuits consist of many parallel paths, which increases the input capacitance of the circuit. This further aggravates the issue of contention current during evaluation. This is why keeper sizing optimizations alone do not prove beneficial for upcoming generations of domino circuits. With a growing number of inputs, such as 8, 16, 32 and 64, the performance of the circuit degrades non-linearly [19]. So the situation seems acceptable at low input counts, but it becomes a major trade-off at higher ones, and novel ways are needed to improve the speed and performance of such circuits.
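The keeper ratio of equation (7) is straightforward to evaluate. The sketch below uses relative mobilities and hypothetical transistor dimensions; none of the numbers are taken from the circuits compared in this paper.

```python
def keeper_ratio(mu_p, w_keeper, l_keeper, mu_n, w_eval, l_eval):
    """Keeper ratio K per equation (7): keeper drive over evaluation-network drive."""
    return (mu_p * w_keeper / l_keeper) / (mu_n * w_eval / l_eval)


# Assumed relative mobilities (electrons 2.5x holes) and sizes in micrometres.
k_small = keeper_ratio(1.0, 0.2, 0.1, 2.5, 1.0, 0.1)
k_large = keeper_ratio(1.0, 0.4, 0.1, 2.5, 1.0, 0.1)

print(f"K with 0.2 um keeper: {k_small:.3f}")
print(f"K with 0.4 um keeper: {k_large:.3f}")
```

Doubling the keeper width doubles K, which strengthens the hold on the dynamic node at the cost of more contention against the evaluation network.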

Most circuit techniques aimed at improving these issues focus on modifying the topology of the keeper control circuit [20-23]. The rest focus on modifying the design of the evaluation network, varying the threshold voltage of the PDN and managing the role of the pull up network (PUN) in precharging the internal node [24-32]. In short, one group aims primarily at managing keeper activation, while the other aims at directly managing current flow and current contention by other measures. The latest techniques in use are reviewed next.

In order to manage the leakage current, controlled keeper current comparison domino (CKCCD) [26] uses a current comparison strategy in the evaluation network. It is shown in Fig. 3(a) and 3(b). Its advantages are improved noise immunity and reduced power consumption while working in a footed domino configuration. It achieves this by controlling the keeper at the necessary times in a logic operation, thereby reducing contention.



**Fig. 3** (a) Domino logic based OR gate circuit in Controlled keeper current comparison configuration or CKCCD and its (b) Reference block

Current comparison domino [27] performs its operation using the same basic strategy of comparing the currents to reduce contention. In this technique the comparison is done between mirror current from PUN and the current in reference circuit. It gives lesser power consumption. Due to footer transistor, much better control is observed and therefore robustness of the circuit is improved along with noise immunity.

Delayed Feedback Domino [27] uses an inverter driven feedback to the input supply which signals the input nodes to activate during precharge phase and signals it to allow current flow during evaluation phase. This technique uses output capacitance to drive the input circuitry, which becomes a problem if more than 3 output stages are driven from this domino circuit. The current reduction would lead to error in bit synchronization throughout the domino chain. In addition to the possibility of bit error at high output load, this circuit also suffers from a possible glitch issue in case all inputs provide a bit LOW. In this case, input network would by default discharge all the dynamic current through PDN, but it will still drive output node to saturation due to the delayed feedback signal being provided to the input via the output stage. Due to this a glitch might occur where dynamic node will give a high value, even when all the inputs show a low bit value.

Low Power Stacked Domino (LPSD) is presented in [29]. It relies on a stacked output node built from 3 transistors of varying widths. The keeper activates the input circuit, thereby signaling and controlling the precharge phase keeper. The circuit is shown in Fig. 4. Circuits that draw a signal from the output in order to control the input precharge phase generally face stability issues, which is a limiting constraint of this circuit too. At a frequency of 500MHz, this circuit will not operate reliably if noise fluctuations cause current spikes, because it relies strongly on the NMOS transistor at the output node turning OFF at the same time as the pull down network is activated, in order to signal the input supply correctly. Other than this, the circuit provides considerable power reduction.



Fig. 4 Domino logic based OR gate circuit in Low Power Stacked Domino

Analysis of these techniques highlights the need for a novel technique that minimizes the leakage losses due to the direct path from supply to ground during circuit operation, without affecting the cascade operation of the circuits. The SFLD design uses direct input to output voltage transfer, which leads to power losses. While LPSD reduces such leakage by partitioning the input capacitance, it is still not able to reduce the output voltage swing, due to its clock driven transistors. Both these concerns are met by CCD and CKCCD, which use isolated input to output circuitry and an independent operation for driving transistors. But both CCD and CKCCD rely on precise matching circuits, which is a problem for battery operated circuits that spend long periods in standby mode, putting the logic value of the output at risk. Therefore a novel domino circuit design is needed that gives lower power loss and better reliability at the given supply voltages and process corners. Along with this, it should also operate at high speed with better noise tolerance.

## 3. Proposed Circuit

The proposed design is the charge sharing tolerant domino (CSTD). The circuit's operation is detailed with respect to its input states, as per the truth table of the OR gate: when any or all inputs are high, and when all inputs are at 0, i.e. low voltage. The primary aim of the design is to reduce the output voltage swing, which in turn lowers dynamic power consumption. The voltage swing can be lowered at the output end as well as the input end, but there are limits that need to be respected. The difference between the 2 voltages must also reach a certain level for the circuit to work properly.

The proposed circuit is shown in Fig. 5. The 1<sup>st</sup> stage contains pull up and pull down transistors along with the input evaluation network of N transistors. Clock is applied to Mpre and Meva to precharge the circuit and use it in evaluation mode for next cycle. 5 transistors make up for the 2<sup>nd</sup> stage of isolation. This stage operates at certain difference voltage as fed from nodes N1 and N2. Finally the output or 3<sup>rd</sup> stage contains a skewed inverter to give the final output of the circuit.



Fig. 5 Charge Sharing Tolerant Domino logic based OR gate circuit

The circuit operates as an OR gate, as shown in Fig. 6. In this example, the inputs applied to transistors M-in1 and M-in2 (inputs 1 and 2) are HIGH while all other inputs are LOW. For this combination, with a 50% duty cycle between the evaluation and precharge phases, the circuit behaves as expected of an OR gate: when inputs 1 and 2 are both HIGH, the output goes HIGH, as given in the timing diagram. The dynamic node carries the inverted signal and drives the output stage accordingly.
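The logical behavior described above can be sketched with a purely behavioral model of a domino OR gate. It ignores all electrical effects (leakage, charge sharing, delay) and assumes the usual clocking convention, clock low for precharge and clock high for evaluate, which is an assumption rather than something stated for this circuit.

```python
def domino_or(clock, inputs, dynamic_node=True):
    """Return (dynamic_node, output) after one clock phase of an ideal domino OR."""
    if clock == 0:
        # Precharge: the dynamic node is pulled high, so the inverted output is low.
        dynamic_node = True
    elif any(inputs):
        # Evaluate with at least one input high: the dynamic node discharges.
        dynamic_node = False
    # Evaluate with all inputs low: the node simply holds its precharged value.
    return dynamic_node, not dynamic_node


def run(cycles):
    """Apply one precharge and one evaluate phase per input vector."""
    node, trace = True, []
    for inputs in cycles:
        node, _ = domino_or(0, inputs, node)
        node, out = domino_or(1, inputs, node)
        trace.append(out)
    return trace


print(run([[1, 1, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]))
```

Each evaluate phase yields the OR of the inputs, while every precharge phase returns the output to LOW, matching the alternating pattern expected in a timing diagram.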



Fig. 6 Timing diagram analysis for operation of the proposed Charge Sharing Tolerant Domino circuit.

One of the prominent features in all domino logic designs shown in [24-32] is the clock driven transistors in the pull up part of the circuit. They are meant to prevent overlapping signals between precharge and evaluate modes. This technique makes the circuit more robust, but such circuits perform efficiently only at 100-300MHz. Domino circuits at the 90nm node or below operate at 500MHz to 1.5 GHz. Due to this, the output has to drive the pull up transistor. This gives a workable trade-off between the circuit's power consumption and speed of operation. An output driven transistor, however, lowers the drive current. When domino is operated in cascaded stages, this becomes problematic, as it puts the next stage at high risk of flipping its output value. This issue has been rectified by using 5 transistors in a loop configuration in the 2<sup>nd</sup> stage. The loop prevents the charge from leaking, so that it stays in the circuit and drives other gates. As shown in Fig. 7, the charge sharing scheme of this circuit focuses mainly on the dynamic node driven footer transistor and its synchronization with M2 and M4, for precharge and evaluate mode respectively.



Fig. 7 Prevention of Charge Sharing via internal looping in CSTD

Transistor widths have to be managed precisely to keep the contention current as low as possible without delaying the circuit further. M1, M2 and M5 control the circuit's switching between logic HIGH and logic LOW. When any one of the inputs is HIGH, the output is meant to be logic HIGH. For input HIGH operation, M3 is turned OFF because its input is held at the dynamic voltage, while N2 is at low voltage, thereby turning OFF M4 too. The operation then progresses to the next stage without any voltage being affected by stage 1. With N2 low, Minv-p is turned OFF, which puts the voltage at N3 at a low level. This results in the output being HIGH. For input LOW, N1 is HIGH, thereby turning OFF M4 and subsequently the next stage. This lets the keeper supply current to hold the pre-output voltage high, thereby leading to a low voltage at the output. The widths of M1, M2 and M3 are kept low, at just 2.5 times the length, to limit the power loss due to charging current, while M4 and M5 are kept wide so that the voltage is held strongly at a low level at the output.

The keeper used at the output improves the capacitive signaling by increasing the current flow at node N3. This would not be achieved with a small width transistor, which is why the width of M5 is at least 4 times the length of the transistor, as shown in (8).

$$I_{Dp} = K'_{p}(\frac{W}{L}) \int_{V=0}^{V=V_{SDp}} [V_{SGp} - |V_{Tp}| - V(y)] dV \tag{8}$$

 $I_{Dp}$  stands for drain charge current.  $K'_p$  stands for the process transconductance.  $V_{SDp}$  stands for voltage between source and drain.  $V_{SGp}$  stands for voltage between gate and source. Other than these,  $V_{Tp}$  and V(y) represent threshold and differential voltage. Differential is measured along the length of the MOSFET.
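The integral in equation (8) has a simple closed form, $(V_{SGp} - |V_{Tp}|)V_{SDp} - V_{SDp}^2/2$, valid while the transistor stays in the linear region. The sketch below compares that closed form against a direct numerical integration. The process transconductance and bias voltages are hypothetical; only the aspect ratio of 4 follows the sizing mentioned above.

```python
def keeper_current(k_p, w_over_l, v_sg, v_sd, v_tp):
    """Closed-form evaluation of equation (8) in the linear region."""
    v_ov = v_sg - abs(v_tp)
    return k_p * w_over_l * (v_ov * v_sd - 0.5 * v_sd ** 2)


def keeper_current_numeric(k_p, w_over_l, v_sg, v_sd, v_tp, steps=10000):
    """Midpoint-rule integration of the same integrand, as a cross-check."""
    v_ov = v_sg - abs(v_tp)
    dv = v_sd / steps
    total = sum((v_ov - (i + 0.5) * dv) * dv for i in range(steps))
    return k_p * w_over_l * total


# Assumed values: K'p = 50 uA/V^2, W/L = 4, V_SG = 1 V, V_SD = 0.3 V, V_Tp = -0.3 V.
closed = keeper_current(50e-6, 4.0, 1.0, 0.3, -0.3)
numeric = keeper_current_numeric(50e-6, 4.0, 1.0, 0.3, -0.3)
print(f"closed form: {closed * 1e6:.2f} uA, numeric: {numeric * 1e6:.2f} uA")
```

The current scales linearly with W/L, which is the reason a wider M5 delivers a stronger holding current at N3.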

Fig. 7 shows the scheme with which charge sharing has been eliminated. The internal looping mechanism provides a loop gain T as shown in equation 9. This gain allows for a current drive at output, higher than the input charging current, leading to a strong output signal as well as a higher current driving M3 and M4 in 2<sup>nd</sup> stage. In such circumstances, charge sharing gets strongly prevented, due to the 2<sup>nd</sup> stage being cut off the input or output, during the mid evaluate cycle and only acts to increase the drive current as per the gain formula given by T.

$$T = A_{in}g_{mK}Z_X \tag{9}$$

The dual strategy of CSTD allows it to manage a loop gain based internal charge sharing prevention, leading to a high transconductance given by  $G_{m,eff}$ , as shown in equation 10, thereby creating a low resistance path as given in equation 11.

$$G_{m,eff} = \frac{G_{mK1}}{(1+G_{mK1})R} \tag{10}$$





Fig. 8 Contention Current Partitioning throughout precharge and evaluation mode in CSTD

Secondly, this strategy also uses contention current partitioning scheme to isolate 1<sup>st</sup> stage from 3<sup>rd</sup> stage. This is shown in Fig 8. Due to the internal looping of the circuit, certain delay is introduced inherently in this circuit during the evaluation phase, but it is compensated by the higher drive current that is achieved at the output end, and current-less comparison of voltages at N1 and N2. This leads to a lowered power consumption of the entire circuit, as well as increased noise tolerance of the circuit, especially the noise generated at the input nodes.

## 4. Design evaluation and discussion

The 4 most important performance metrics for evaluating any domino logic circuit are power dissipation (mostly dynamic power dissipation for dynamic logic circuits), noise tolerance, chip area and the speed of the circuit [30]. Other performance parameters, such as repeatability, sophistication and stability, are also managed in the proposed circuit.

Noise tolerance metric for the proposed circuit has been discussed in Table 1. High frequency operation has been focused here from few mega hertz to up to 1 GHz. This is done in order to provide the higher performance for the circuit, in terms of its functionality and overall operation too. Since the calculation of performance parameters should be done with actual input metrics, based on the standard rise time and fall time calculations, practical inputs have been provided to the circuit and output as well as other performance metrics have been observed for the same. All designs have been performed at same delay of td=70ps, in order to keep the uniformity of speed calculations and other calculations in the circuit.

The given calculations show that, there is a sharp 42% rise in noise tolerance metric in the proposed circuit as compared to the standard Footer based domino design, while a considerable 16% increase as compared to the latest current comparison domino design. Due to the charge sharing tolerant circuit working in evaluation mode, any signal that enters the circuit has to be stabilized first, before it can affect the output, which is why, noise immunity is higher. The middle stage or 2<sup>nd</sup> stage isolates input noise strongly, but does not impact much on the noise from other branches. However this paper focuses only on improving the noise metric for input noise immunity as it is the most important noise point in the entire circuit.

| Fan-in | Metric      | SFLD | CKCCD | CCD  | DFD  | LPSD | CSTD |
|--------|-------------|------|-------|------|------|------|------|
| 8      | ANI         | 0.46 | 0.49  | 0.36 | 0.64 | 0.66 | 0.67 |
|        | ANI (norm.) | 1    | 1.03  | 0.79 | 1.45 | 1.50 | 1.55 |
| 16     | ANI         | 0.38 | 0.48  | 0.29 | 0.59 | 0.62 | 0.65 |
|        | ANI (norm.) | 1    | 1.17  | 0.81 | 1.64 | 1.67 | 1.69 |
| 32     | ANI         | 0.36 | 0.43  | 0.26 | 0.56 | 0.60 | 0.62 |
|        | ANI (norm.) | 1    | 1.30  | 0.83 | 1.66 | 1.78 | 1.83 |
| 64     | ANI         | 0.28 | 0.42  | 0.23 | 0.52 | 0.52 | 0.56 |
|        | ANI (norm.) | 1    | 1.53  | 0.84 | 1.85 | 1.85 | 1.89 |

**Table 1:** Average Noise Immunity, absolute and normalized, for different techniques and different fan-ins are observed in this section. All techniques are compared for the same delay.

Power consumption has been calculated in table 2 for the proposed circuit as compared with the other existing designs for domino logic based OR gate circuits. Static power consumption has not been considered because it directly depends on the physical characteristics of the circuit and the leakage models for the circuit. This proposed domino logic based OR gate circuit focused on reducing dynamic power consumption, both with respect to the short circuit loss and with respect to the switching losses. With respect to the base circuit, the proposed design has acquired 46% reduction in dynamic power at 64 input combination and 47% reduction in dynamic power at 32 bit input.

**Table 2:** Power consumption, absolute and normalized, for different techniques is given below. All techniques are compared at the same delay at each input combination.

| Fan-in | Metric    | SFLD | CKCCD | CCD  | DFD  | LPSD | CSTD |
|--------|-----------|------|-------|------|------|------|------|
| 8      | P         | 25.2 | 24.4  | 28.8 | 19.8 | 22.5 | 21.9 |
|        | P (norm.) | 1    | 0.96  | 1.14 | 0.78 | 0.90 | 0.87 |
| 16     | P         | 29.2 | 27.3  | 34.1 | 23.4 | 22.2 | 20.6 |
|        | P (norm.) | 1    | 0.93  | 1.14 | 0.77 | 0.73 | 0.71 |
| 32     | P         | 34.7 | 33.9  | 39.2 | 29.8 | 24.7 | 20.2 |
|        | P (norm.) | 1    | 0.96  | 1.11 | 0.85 | 0.70 | 0.66 |
| 64     | P         | 43.8 | 39.8  | 47.4 | 37.6 | 27.2 | 25.1 |
|        | P (norm.) | 1    | 0.90  | 1.07 | 0.84 | 0.62 | 0.52 |

Most common OR gate operations are performed at a fan-in of 32 and a fan-out of 2 to 4, which is why the 32 input fan-in is presented in Table 3 for the complete comparison of performance parameters. Select applications require a large number of gates to be cascaded with still more gates, which is why a high fan-in is important, not just 32 inputs but 64 and even 128, as seen in microprocessors, SRAMs, multiplexers and other such applications. The table shows the performance metrics, and this section discusses which designs perform best, the reasons for it, and the tradeoffs to expect when one of these techniques is selected for a particular application. CSTD provides a reduced area compared to LPSD due to the removal of the double keeper arrangement. It allows for a better circuit loop strategy because charge leakage is avoided and the entire charge loops within the circuit instead of dropping to ground. This allows CSTD to reduce power consumption by 18.9% compared to the latest circuit, LPSD, and by 51.2% compared to SFLD.

$$GPC = \frac{ANI_n}{P_n \times t_n^2 \times A_n} \tag{12}$$

Table 3 also shows a new parameter used for comparison: Gross Performance of the Circuit (GPC). This metric combines all 4 major parameters of the circuit for the high speed applications targeted in this paper, such as multiplexers, memory circuits, read/write data paths and microprocessors. Delay is squared to emphasize speed, since the generalized metric for evaluating the delay of an entire chain includes the square of this time. The other 3 parameters capture the power, area and noise immunity needed to form the complete metric shown in equation 12.
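The structure of the GPC metric can be made concrete with a small helper. The sketch below implements equation 12 directly on normalized inputs; the example design values are hypothetical and are not entries from Table 3.

```python
def gross_performance(ani_n, power_n, delay_n, area_n):
    """GPC per equation 12: normalized ANI over power * delay^2 * area."""
    return ani_n / (power_n * delay_n ** 2 * area_n)


# The reference design has every normalized quantity equal to 1.
baseline = gross_performance(1.0, 1.0, 1.0, 1.0)

# Hypothetical design: 50% better ANI, 25% less power, 25% more area, same delay.
candidate = gross_performance(1.5, 0.75, 1.0, 1.25)

# The same hypothetical design if it were twice as slow.
slow_candidate = gross_performance(1.5, 0.75, 2.0, 1.25)

print(baseline, candidate, slow_candidate)
```

Because delay enters squared, doubling the delay cuts GPC by a factor of four, whereas doubling power or area only halves it.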

**Table 3:** Comparison of performance parameters for different designs is shown below. All circuits operate with same delay of 90ps. Gate operation is 32 bit input OR gate.

|                               | SFLD | CKCCD | CCD  | DFD  | LPSD | CSTD |
|-------------------------------|------|-------|------|------|------|------|
| Total transistors             | 36   | 39    | 84   | 44   | 48   | 44   |
| Total Area on Chip            | 125  | 133   | 292  | 152  | 239  | 151  |
| Total Area on Chip (Norm)     | 1    | 1.08  | 2.29 | 1.18 | 1.89 | 1.21 |
| Power Dissipated              | 34.6 | 34.8  | 39.1 | 29.8 | 24.7 | 20.2 |
| Power Dissipated (Norm)       | 1    | 0.97  | 1.11 | 0.89 | 0.71 | 0.66 |
| Average Noise Immunity        | 0.32 | 0.28  | 0.44 | 0.56 | 0.60 | 0.62 |
| Average Noise Immunity (Norm) | 1    | 0.83  | 1.31 | 1.67 | 1.78 | 1.83 |
| Gross Performance of Circuit  | 1    | 0.65  | 0.91 | 0.82 | 1.1  | 1.48 |

GPC of the circuit at different fan-ins is presented in Fig. 9. It shows the advantage of the CSTD circuit at all high fan-ins except at 8 bit fan-in. The reason for the high metric values is the removal of charge leakage via internal looping. However, the looping is only as strong as the input capacitance provided to it, which is why it gives only a moderate improvement at 8 bit but shows its importance at higher fan-ins.



**Fig. 9** GPC comparison for different techniques at four fan-ins: 8 bit, 16 bit, 32 bit and 64 bit.

To check the circuit over its complete functional envelope, corner analysis is performed, as shown in Fig. 10 and 11. Five front-end process corners are used, TT, FF, SS, FS and SF, in which transistor dimensions are adjusted for fast and slow operation to confirm that the circuit still operates correctly within those error margins. Temperature analysis is carried out at temperature corners; since the target applications are microprocessors and multiplexers, a range of -40 °C to 125 °C is appropriate. The comparison normalizes delay to SFLD, so all other delays are read relative to it. Some circuit designs rely heavily on this analysis to check their feasibility, especially when they are cascaded or combined with other designs in the same circuit. Voltage corners are checked at 1.1 V and 0.9 V to estimate chip operation at those margins. CSTD shows enhanced performance at all of these corners except at lower voltages, since the circuit relies on precise noise margins and provides higher noise immunity. Even there, it remains well ahead of SFLD, showing the improved performance of the proposed CSTD domino-logic OR gate circuit.



**Fig. 10** Corner analysis for proposed circuit CSTD with SFLD at varying temperature and front end process corners on normalized delay of the circuit.



**Fig. 11** Corner analysis for proposed circuit CSTD with SFLD at varying temperature and voltage corners on normalized delay of the circuit.
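The corner sweeps behind Fig. 10 and Fig. 11 can be organized as two small grids, with every delay divided by the SFLD delay at the same corner. The following is a minimal sketch of that bookkeeping, not the authors' simulation flow. It assumes the five process corners are the usual TT, FF, SS, FS and SF set, takes only the temperature endpoints and the two supply voltages quoted in the text, and uses hypothetical names throughout.

```python
from itertools import product

# Corner values taken from the text; SS is assumed to be the fifth process corner.
PROCESS_CORNERS = ["TT", "FF", "SS", "FS", "SF"]
TEMPERATURES_C = [-40, 125]    # endpoints of the quoted temperature range
SUPPLY_VOLTAGES = [0.9, 1.1]   # voltage corners quoted in the text


def sweep(first_axis, second_axis):
    """Return every (a, b) pair of two corner axes as a list of tuples."""
    return list(product(first_axis, second_axis))


def normalize_delays(delays, reference="SFLD"):
    """Divide each design's delay by the reference design's delay at each corner.

    `delays` maps design name -> {corner: delay}; every design is expected
    to cover the same corners as the reference design.
    """
    ref = delays[reference]
    return {
        design: {corner: value / ref[corner] for corner, value in per_corner.items()}
        for design, per_corner in delays.items()
    }


if __name__ == "__main__":
    process_temperature = sweep(PROCESS_CORNERS, TEMPERATURES_C)   # Fig. 10 style grid
    voltage_temperature = sweep(SUPPLY_VOLTAGES, TEMPERATURES_C)   # Fig. 11 style grid
    print(len(process_temperature), "process/temperature corners")
    print(len(voltage_temperature), "voltage/temperature corners")
```

Each corner tuple would key one simulation run per design, and `normalize_delays` then yields a value of 1 for SFLD at every corner, which matches the normalization used in the figures.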

## 5. Conclusion

This paper presents a Charge Sharing Tolerant Domino circuit with contention current partitioning, applied to OR gate logic circuits with 8, 16, 32 and 64 bit fan-in. Charge sharing is an inherent issue of all domino circuits, and its impact on circuit performance has grown as technology scales to deeper nodes.

This paper has mitigated charge sharing to a considerable degree through two major strategies. First, it reduces power consumption at the dynamic stage by using a CST scheme, which lets the charge be used quickly by the second circuit without being discharged or shared, even when multiple stages are driven from it. Second, it uses a current partitioning scheme that disengages the 2<sup>nd</sup> stage at the onset of a new cycle, so that the current from the input stage does not directly drive the 3<sup>rd</sup> stage, i.e. the output stage. This isolation allows a weak keeper at the output to control the operation and to drive multiple output stages as well.

All MOSFET models used for the comparison and proposed circuits were based on 90 nm PTM BSIM version 4 models. The proposed circuit can be used in multiplexers, decoders, arithmetic units, DRAM and SRAM. In future work, it can be integrated with benchmark circuits to work directly in an FPGA unit for high-performance microprocessors and on-chip units.

## Bibliography

- [1] N.H.E. Weste, D.M. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, Pearson/Addison- Wesley, Boston, 2010.
- [2] R.J. Baker, CMOS: Circuit Design, Layout, and Simulation, John Wiley & Sons, NJ, 2011.
- [3] V. Kursun, E.G. Friedman, Multi-Voltage CMOS Circuit Design, Wiley, New York, 2006.
- [4] M.H. Anis, M.W. Allam, M.I. Elmasry, Energy-efficient noise-tolerant dynamic styles for scaled-down CMOS and MTCMOS technologies, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 10 (2) (2002) 71–78, DOI: 10.1109/92.994977.
- [5] H. Iwai, Roadmap for 22 nm and beyond, Microelectron. Eng. 86 (9) (2009) 1520–1528.
- [6] V. Kursun, E.G. Friedman, Domino logic with variable threshold voltage keeper, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 11 (6) (2003) 1080–1093, DOI: 10.1109/TVLSI.2003.817515.
- [7] A. Alvandpour, R. Krishnamurthy, K. Sourrty, S.Y. Borkar, A sub-130 nm conditionalkeeper technique, IEEE J. Solid-State Circuits 37 (5) (2002) 633–638, DOI: 10.17485/ijst/2016/v9i22/90152.
- [8] Y. Sun, V. Kursun, Carbon nanotubes blowing new life into NP dynamic CMOS circuits, IEEE Trans. Circuits Syst. I 61 (2) (2014) 420–428 (February), DOI: 10.1109/TVLSI.2013.2268131.
- [9] A. Amirabadi, A. Afzali-Kusha, Y. Mortazavi, M. Nourani, Clock delayed domino logic with efficient variable threshold voltage keeper, IEEE Transactions on VLSI Systems 15 (2007) 125–134, DOI: 10.1109/TVLSI.2007.891097.
- [10] F. Haj Ali Asgari, M. Ahmadi, J. Wu, Low power high performance keeper technique for high fan-in dynamic gates, in: Proceedings of European Conference on Circuit Theory and Design (ECCTD), 2009, pp. 523–526, DOI: 10.1016/j.vlsi.2011.07.002.
- [11] S.O. Jung et al., Skew-tolerant high-speed (STHS) domino logic, in: Proceedings of ISCAS, vol. 4, 2001, pp. 154–157, DOI: 10.1109/ISCAS.2001/922195.
- [12] S.M. Sharroush et al., Speeding-up wide-fan in domino logic using a controlled strong PMOS keeper, in: Proceedings of the International Conference on Computer and Communication

Engineering, 2008, pp. 633–637, DOI: 10.1109/ICCCE.2008.4580681.

- [13] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, Analysis and design of analog integrated circuits, 4<sup>th</sup> ed. New York: Wiley, 2001.
- [14] J. M. Rabaey, A. Chandrakasan, and B. Nicolic, Digital Integrated Circuits: A Design Perspective, 2<sup>nd</sup> ed. Upper Saddle River, NJ: Prentice Hall, 2003.
- [15] R. Krambeck, C.M. Lee, H.F. Law, High-speed compact circuits with CMOS, IEEE J. Solid-State Circuits 17(1982)614–619, DOI: 10.1109/JSSC.1982.1051786.
- [16] M. Elgebaly, M. Sachdev, A leakage tolerant energy efficient wide domino circuit technique, in: Proceedings of the 45<sup>th</sup> Midwest Symposium on Circuits and Systems, MWSCAS-2002, IEEE, 2002, vol.481, pp.I-487–490, DOI: 10.1016/j.vlsi.2015.06.003.
- [17] F. Moradi, T. VuCao, E.I. Vatajelu, A. Peiravi, H. Mahmoodi, D.T. Wisland, Domino logic designs for high-performance and leakage-tolerant applications, Integration VLSI Journal 46(2013)247–254, DOI: 10.1016/j.vlsi.2012.04.005.
- [18] K. Roy, S. Mukhopadhyay, H. Mahmoodi-meimand, Leakage current mechanisms and leakage reduction techniques in deep-submicron CMOS circuits, Proceedings of the IEEE 91 (2003) 305–327, DOI: 10.1109/JPROC.2002.808156.
- [19] F. Haj Ali Asgari, M. Ahmadi, J. Wu, Low power high performance keeper technique for high fan-in dynamic gates, in: Proceedings of European Conference on Circuit Theory and Design (ECCTD), 2009, pp. 523–526, DOI: 10.1109/ECCTD.2009.5275046.
- [20] P. Zhao, et al., A Low power domino with differential-controlled-keeper, IEEE International Symposium on Circuits and Systems (ISCAS) (2007) 1625–1628, DOI: 10.1109/ISCAS.2007.378830.
- [21] J.R.G. David, N. Bhat, A low power, process invariant keeper for high speed dynamic logic circuits, IEEE International Symposium on Circuits and Systems (ISCAS) (2008) 1668– 1671, DOI: 10.1109/ISCAS.2008.4541756.
- [22] J. Wang, W. Wu, N. Gong, L. Hou, Domino gate with modified voltage keeper, in: Proceedings of 11th International Symposium on Quality Electronic Design, 2010, pp. 443– 446, DOI: 10.1109/ISQED.2010.5450538.
- [23] S.M. Sharroush et al., Speeding-up wide-fan in domino logic using a controlled strong PMOS keeper, in: Proceedings of the International Conference on Computer and Communication Engineering, 2008, pp. 633–637, DOI: 10.1109/ICCCE.2008.4580681.
- [24] L. Wang, R. Krishwamurthy, K. Soumyanath, N.R. Shanbhag, An energy-efficient leakagetolerant dynamic circuit technique, in: Proceedings of the 13<sup>th</sup> Annual IEEE International ASIC/SOC Conference, IEEE, 2000, pp. 221–225, DOI: 10.1109/ASIC.2000.880705.
- [25] H. Mahmoodi-Meimand, K. Roy, Diode-footed domino: a leakage-tolerant high fan-in

dynamic circuit design style, IEEE Trans. Circuits Syst. I: Regul. Pap. 51(2004)495–503, DOI: 10.1109/TCSI.2004.823665.

- [26] A. Peiravi, M. Asyaei, Robust low leakage controlled keeper by current-comparison domino for wide fan-in gates, Integr. VLSIJ. 45(2012)22–32, DOI: 10.1016/j.vlsi.2011.07.002.
- [27] A. Peiravi, M. Asyaei, Current-comparison-based domino: new low-leakage high-speed domino circuit for wide fan-in gates, IEEE Trans. Very Large Scale Integr. (VLSI) Syst.21(2013)934–943, DOI: 10.1109/TVLSI.2012.2202408.
- [28] S. Garg, S. Goyal, P. Nawale, R. Kaur and N. Pander, Leakage Tolerant Wide OR Domino Gate with Modified Keeper Controlling Network, 2019 IEEE 16<sup>th</sup> India Council International Conference (INDICON), 2019, DOI:10.1109/INDICON47234.2019.9030342.
- [29] U. Panwar, A. Shrivastava, A Novel Technique to improve Performance Evaluation of Domino Logic Circuits in CMOS and FinFET technology, 2<sup>nd</sup> International Conference on Data, Engineering and Applications (IDEA), 2020, DOI:10.1109/IDEA49133.2020.9170682.
- [30] Jan M. Rabaey, Massoud Pedram, Low power design methodologies, Kluwer Academic Publishers, 1996.
- [31] Banerjee K., Amerasekera A., and Hu C., Characterization of VLSI circuit interconnect heating and failure under ESD conditions, International Reliability Physics Symposium, 1996, DOI: 10.1109/RELPHY.1996.492126.
- [32] D. Sylvester and K. Keutzer, A Global Wiring Paradigm for deep submicron design, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Volume 19, Issue 1, Feb 2000, pp. 242-252, DOI: 10.1109/43.828553.
- [33] M. Alioto, G. Palumbo, and M. Pennisi, Understanding the effect of process variations on the delay of static and domino logic, IEEE Trans. Very Large Scale (VLSI) Syst., vol. 18, no. 5, pp. 697-710, May 2010, DOI: 10.1109/TVLSI.2009.2015455.
- [34] M. H. Anis, M. W. Allam, and M. I. Elmasry, Energy-efficient noise-tolerant dynamic styles for scaled- down CMOS and MTCMOS technologies, IEEE Trans. Very Large Scale (VLSI) Syst., vol. 10, no. 2, pp. 71- 78, Apr. 2002, DOI: 10.1109/92.994977.